Enhancing Low-resolution Face Recognition with Feature Similarity Knowledge Distillation
In this study, we introduce a feature knowledge distillation framework to
improve low-resolution (LR) face recognition performance using knowledge
obtained from high-resolution (HR) images. The proposed framework transfers
informative features from an HR-trained network to an LR-trained network by
reducing the distance between them. A cosine similarity measure was employed as
a distance metric to effectively align the HR and LR features. This approach
differs from conventional knowledge distillation frameworks, which use L_p
distance metrics, and offers the advantage of converging well when reducing the
distance between features of different resolutions. Our framework achieved a 3%
improvement over the previous state-of-the-art method on the AgeDB-30 benchmark
without bells and whistles, while maintaining a strong performance on HR
images. The effectiveness of cosine similarity as a distance metric was
validated through statistical analysis, making our approach a promising
solution for real-world applications in which LR images are frequently
encountered. The code and pretrained models are publicly available at
https://github.com/gist-ailab/feature-similarity-KD
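The cosine-similarity distillation objective described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the released code; the function name, feature shapes, and mean reduction are assumptions:

```python
import numpy as np

def cosine_distillation_loss(student_feat, teacher_feat):
    """Align LR (student) features with HR (teacher) features using
    cosine similarity instead of an L_p distance."""
    # Flatten each sample's feature map to a vector: (N, C, H, W) -> (N, D)
    s = student_feat.reshape(student_feat.shape[0], -1)
    t = teacher_feat.reshape(teacher_feat.shape[0], -1)
    # Per-sample cosine similarity in [-1, 1]
    cos = (s * t).sum(axis=1) / (
        np.linalg.norm(s, axis=1) * np.linalg.norm(t, axis=1) + 1e-12)
    # 1 - similarity, so perfectly aligned features give zero loss
    return float(np.mean(1.0 - cos))
```

Because cosine similarity is scale-invariant, this loss depends only on feature direction, which is one plausible reason it converges well across resolutions.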
SleePyCo: Automatic Sleep Scoring with Feature Pyramid and Contrastive Learning
Automatic sleep scoring is essential for the diagnosis and treatment of sleep
disorders and enables longitudinal sleep tracking in home environments.
Conventionally, learning-based automatic sleep scoring on single-channel
electroencephalogram (EEG) is actively studied because obtaining multi-channel
signals during sleep is difficult. However, learning representation from raw
EEG signals is challenging owing to the following issues: 1) sleep-related EEG
patterns occur on different temporal and frequency scales and 2) sleep stages
share similar EEG patterns. To address these issues, we propose a deep learning
framework named SleePyCo that incorporates 1) a feature pyramid and 2)
supervised contrastive learning for automatic sleep scoring. For the feature
pyramid, we propose a backbone network named SleePyCo-backbone to consider
multiple feature sequences on different temporal and frequency scales.
Supervised contrastive learning allows the network to extract class
discriminative features by minimizing the distance between intra-class features
and simultaneously maximizing that between inter-class features. Comparative
analyses on four public datasets demonstrate that SleePyCo consistently
outperforms existing frameworks based on single-channel EEG. Extensive ablation
experiments show that SleePyCo exhibits enhanced overall performance, with
significant improvements in discrimination between the N1 and rapid eye
movement (REM) stages. Comment: 14 pages, 3 figures, 8 tables
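The supervised contrastive objective described above can be sketched as follows. This is an illustrative SupCon-style loss in NumPy, assuming L2-normalized embeddings and a temperature hyperparameter; SleePyCo's exact implementation may differ:

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: pulls intra-class features together and pushes
    inter-class features apart in embedding space."""
    labels = np.asarray(labels)
    # L2-normalize so dot products are cosine similarities
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = f @ f.T / temperature
    n = len(labels)
    logits_mask = 1.0 - np.eye(n)                # exclude self-similarity
    pos_mask = (labels[:, None] == labels[None, :]).astype(float) * logits_mask
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    exp_sim = np.exp(sim) * logits_mask
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                       # anchors with >= 1 positive
    loss = -(pos_mask * log_prob).sum(axis=1)[valid] / pos_counts[valid]
    return float(loss.mean())
```

With correct labels, same-class embeddings that point in the same direction yield a small loss; shuffling the labels yields a larger one.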
Block Selection Method for Using Feature Norm in Out-of-distribution Detection
Detecting out-of-distribution (OOD) inputs during the inference stage is
crucial for deploying neural networks in the real world. Previous methods
commonly relied on the output of a network derived from the highly activated
feature map. In this study, we first revealed that the norm of the feature map
obtained from a block other than the last one can be a better indicator for
OOD detection. Motivated by this, we propose a simple framework consisting of
FeatureNorm, the norm of a block's feature map, and NormRatio, the ratio of
FeatureNorm between ID and OOD samples, which measures the OOD detection
performance of each block. In
particular, to select the block that provides the largest difference between
FeatureNorm of ID and FeatureNorm of OOD, we create Jigsaw puzzle images as
pseudo OOD from ID training samples and calculate NormRatio, and the block with
the largest value is selected. After the suitable block is selected, OOD
detection with the FeatureNorm outperforms other OOD detection methods by
reducing FPR95 by up to 52.77% on CIFAR10 benchmark and by up to 48.53% on
ImageNet benchmark. We demonstrate that our framework generalizes to various
architectures, highlight the importance of block selection, and show that it can
improve previous OOD detection methods as well. Comment: 11 pages including
references, 5 figures, and 5 tables
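The block-selection procedure above can be sketched as follows. This is an illustrative NumPy version: the exact definition of FeatureNorm in the paper (e.g., any pooling before the norm) and the jigsaw pseudo-OOD generation are simplified, so treat the helper names and shapes as assumptions:

```python
import numpy as np

def feature_norm(feature_map):
    """FeatureNorm (simplified): mean L2 norm over channels of a
    block's output feature map, shape (C, H, W)."""
    per_channel = np.linalg.norm(
        feature_map.reshape(feature_map.shape[0], -1), axis=1)
    return float(per_channel.mean())

def norm_ratio(id_maps, pseudo_ood_maps):
    """NormRatio: average FeatureNorm of ID samples divided by that of
    pseudo-OOD (e.g., jigsaw-shuffled) samples, for one block."""
    id_norm = np.mean([feature_norm(m) for m in id_maps])
    ood_norm = np.mean([feature_norm(m) for m in pseudo_ood_maps])
    return float(id_norm / (ood_norm + 1e-12))

def select_block(per_block_id_maps, per_block_ood_maps):
    """Pick the block whose NormRatio (ID vs. pseudo-OOD) is largest."""
    ratios = [norm_ratio(i, o)
              for i, o in zip(per_block_id_maps, per_block_ood_maps)]
    return int(np.argmax(ratios))
```

The block with the largest gap between ID and pseudo-OOD feature norms is then the one whose FeatureNorm serves as the OOD score at inference time.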
INSTA-BEEER: Explicit Error Estimation and Refinement for Fast and Accurate Unseen Object Instance Segmentation
Efficient and accurate segmentation of unseen objects is crucial for robotic
manipulation. However, it remains challenging due to over- or
under-segmentation. Although existing refinement methods can enhance the
segmentation quality, they fix only minor boundary errors or are not
sufficiently fast. In this work, we propose INSTAnce Boundary Explicit Error
Estimation and Refinement (INSTA-BEEER), a novel refinement model that allows
for adding and deleting instances and sharpening boundaries. Leveraging an
error-estimation-then-refinement scheme, the model first estimates the
pixel-wise boundary explicit errors: true positive, true negative, false
positive, and false negative pixels of the instance boundary in the initial
segmentation. It then refines the initial segmentation using these error
estimates as guidance. Experiments show that the proposed model significantly
enhances segmentation, achieving state-of-the-art performance. Furthermore,
with a fast runtime (less than 0.1 s), the model consistently improves
performance across various initial segmentation methods, making it highly
suitable for practical robotic applications. Comment: 8 pages, 5 figures
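The four boundary error categories the model estimates can be made concrete with binary masks. As a hedged sketch, ground-truth targets for such an error estimator could be computed like this during training (array names and shapes are assumptions):

```python
import numpy as np

def boundary_error_maps(pred_boundary, gt_boundary):
    """Pixel-wise boundary error categories: TP/TN/FP/FN masks comparing a
    predicted instance boundary with the ground-truth boundary."""
    tp = pred_boundary & gt_boundary        # boundary correctly predicted
    tn = ~pred_boundary & ~gt_boundary      # non-boundary correctly predicted
    fp = pred_boundary & ~gt_boundary       # spurious boundary pixels
    fn = ~pred_boundary & gt_boundary       # missed boundary pixels
    return tp, tn, fp, fn
```

Feeding estimates of these maps back into a refinement head tells the model where boundaries should be added, deleted, or sharpened.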
Learning to Place Unseen Objects Stably using a Large-scale Simulation
Object placement is a fundamental task for robots, yet it remains challenging
for partially observed objects. Existing methods for object placement have
limitations, such as the requirement for a complete 3D model of the object or
the inability to handle complex shapes and novel objects that restrict the
applicability of robots in the real world. Herein, we focus on addressing the
Unseen Object Placement (UOP) problem. We tackle the UOP problem with two
contributions: (1) UOP-Sim, a large-scale dataset that accommodates various shapes and
novel objects, and (2) UOP-Net, a point cloud segmentation-based approach that
directly detects the most stable plane from partial point clouds. Our UOP
approach enables robots to place objects stably, even when the object's shape
and properties are not fully known, thus providing a promising solution for
object placement in various environments. We verify our approach through
simulation and real-world robot experiments, demonstrating state-of-the-art
performance for placing single-view and partial objects. Robot demos, code,
and the dataset are available at https://gistailab.github.io/uop/ Comment: 8 pages (main
Deep Learning Based Detection of Missing Tooth Regions for Dental Implant Planning in Panoramic Radiographic Images
Dental implantation is a surgical procedure in oral and maxillofacial surgery. Detecting missing tooth regions is essential for planning dental implant placement. This study proposes an automated method that detects regions of missing teeth in panoramic radiographic images. Tooth instance segmentation is required to accurately detect a missing tooth region in panoramic radiographic images containing obstacles, such as dental appliances or restorations. Therefore, we constructed a dataset that contains 455 panoramic radiographic images and annotations for tooth instance segmentation and missing tooth region detection. First, the segmentation model segments teeth in the panoramic radiographic image and generates tooth masks. Second, a detection model uses the tooth masks as input to predict regions of missing teeth. Finally, the detection model identifies the position and number of missing teeth in the panoramic radiographic image. We achieved 92.14% mean Average Precision (mAP) for tooth instance segmentation and 59.09% mAP for missing tooth region detection. As a result, this method can assist clinicians in detecting missing tooth regions for implant placement
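As a hypothetical illustration of the final step (reporting the position and number of missing teeth), once tooth instances have been identified one could compare them against a full dentition. The FDI two-digit numbering used here is an assumption for illustration, not necessarily the paper's convention:

```python
def missing_teeth(detected_numbers):
    """Report absent teeth given detected tooth numbers in FDI notation
    (quadrants 1-4, positions 1-8, e.g., 11 = upper-right central incisor)."""
    full_dentition = {q * 10 + p for q in (1, 2, 3, 4) for p in range(1, 9)}
    return sorted(full_dentition - set(detected_numbers))
```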
BattleSound: A Game Sound Benchmark for the Sound-Specific Feedback Generation in a Battle Game
A haptic sensor coupled to a gamepad or headset is frequently used to enhance the sense of immersion for game players. However, providing haptic feedback for appropriate sound effects involves specialized audio engineering techniques to identify target sounds that vary according to the game. We propose a deep learning-based method for sound event detection (SED) to determine the optimal timing of haptic feedback in extremely noisy environments. To accomplish this, we introduce the BattleSound dataset, which contains a large volume of game sound recordings of game effects and other distracting sounds, including voice chats, from the game PlayerUnknown’s Battlegrounds (PUBG). Given the highly noisy and distracting nature of war-game environments, we set the annotation interval to 0.5 s, which is significantly shorter than that of existing SED benchmarks, to increase the likelihood that each annotated label contains sound from a single source. As a baseline, we adopt mobile-sized deep learning models to perform two tasks: weapon sound event detection (WSED) and voice chat activity detection (VCAD). The accuracy of the models trained on BattleSound was greater than 90% for both tasks; thus, BattleSound enables real-time game sound recognition in noisy environments via deep learning. In addition, we demonstrated that performance degraded significantly when the annotation interval was greater than 0.5 s, indicating that BattleSound, with its short annotation interval, is advantageous for SED applications that demand real-time inference
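The 0.5 s annotation interval implies windowed labeling of the waveform. A minimal sketch of such framing follows; the function name, parameters, and the choice of non-overlapping windows are assumptions for illustration:

```python
def frame_audio(samples, sample_rate, window_s=0.5):
    """Slice a waveform into non-overlapping windows matching a short
    annotation interval, so each label likely covers a single source."""
    hop = int(sample_rate * window_s)  # samples per 0.5 s window
    return [samples[i:i + hop] for i in range(0, len(samples) - hop + 1, hop)]
```

At a 16 kHz sampling rate, each 0.5 s window holds 8,000 samples, and each window receives its own event label.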